13 research outputs found

    Fighting Authorship Linkability with Crowdsourcing

    Full text link
    Massive amounts of contributed content -- including traditional literature, blogs, music, videos, reviews and tweets -- are available on the Internet today, with authors numbering in many millions. Textual information, such as product or service reviews, is an important and increasingly popular type of content that is being used as a foundation of many trendy community-based reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown that, due partly to their specialized/topical nature, sets of reviews authored by the same person are readily linkable based on simple stylometric features. In practice, this means that individuals who author more than a few reviews under different accounts (whether within one site or across multiple sites) can be linked, which represents a significant loss of privacy. In this paper, we start by showing that the problem is actually worse than previously believed. We then explore ways to mitigate authorship linkability in community-based reviewing. We first attempt to harness the global power of crowdsourcing by engaging random strangers in the process of re-writing reviews. As our empirical results (obtained from Amazon Mechanical Turk) clearly demonstrate, crowdsourcing yields impressively sensible reviews that reflect sufficiently different stylometric characteristics such that prior stylometric linkability techniques become largely ineffective. We also consider using machine translation to automatically re-write reviews. Contrary to what was previously believed, our results show that translation decreases authorship linkability as the number of intermediate languages grows. Finally, we explore the combination of crowdsourcing and machine translation and report on the results.

    Exploring linkability of user reviews

    No full text
    Large numbers of people all over the world read and contribute to various review sites. Many contributors are understandably concerned about privacy in general and, specifically, about linkability of their reviews (and accounts) across multiple review sites. In this paper, we study linkability of community-based reviewing and try to answer the question: to what extent are “anonymous” reviews linkable, i.e., highly likely authored by the same contributor? Based on a very large set of reviews from one very popular site (Yelp), we show that a high percentage of ostensibly anonymous reviews can be accurately linked to their authors. This is despite the fact that we use very simple models and an equally simple feature set. Our study suggests that contributors reliably expose their identities in reviews. This has important implications for cross-referencing accounts between different review sites. Also, techniques used in our study could be adopted by review sites to give contributors feedback about linkability of their reviews.

    Optimizing Bi-Directional Low-Latency Communication in Named Data Networking

    No full text
    Content-Centric Networking (CCN) is an alternative to today’s Internet IP-style packet-switched host-centric networking. One key feature of CCN is its focus on content distribution, which dominates current Internet traffic and which is not well-served by IP. Named Data Networking (NDN) is an instance of CCN; it is an ongoing research effort aiming to design and develop a full-blown candidate future Internet architecture. Although NDN emphasizes content distribution, it must also support other types of traffic, such as conferencing (audio, video) as well as more historical applications, such as remote login and file transfer. However, the suitability of NDN for applications that are not obviously or primarily content-centric remains unclear. We believe that such applications are not going away any time soon. In this paper, we explore NDN in the context of a class of applications that involve low-latency bi-directional (point-to-point) communication. Specifically, we propose a few architectural amendments to NDN that provide significantly better throughput and lower latency for this class of applications by reducing routing and forwarding costs. The proposed approach is validated via experiments.